REMARKS

Claims 1-25, 27-32, and 44 are pending; claims 1, 9, 14, 19, and 27 have been amended. 

Claims 1, 9, 14, and 19 have been amended to more particularly point out and distinctly 
claim Applicant's invention. Support for this amendment can be found throughout the 
specification, for example, at page 31, line 11 through page 34, line 12. 

Claim 27 has been amended to conform the claim with claim 24, from which it depends. 
Support for this claim amendment can be found throughout the specification, for example, at 
page 9, lines 8-9. 

The claims have also been amended to correct minor typographical errors. 

The rejections of the claims under 35 U.S.C. §§102 and 103 have been withdrawn. 

The above amendments of the claims are done without prejudice to further prosecution of 
other embodiments of this invention in a continuation, continuation-in-part, divisional, or other 
related application. None of the above amendments adds any new matter to the Application. 



I. Priority Application 

The Office Action has denied the Application the benefit of the May 14, 1998 filing date 
of the priority document because "as presently written, the full scope embraced by each claim 
was not disclosed in the provisional application." (Office Action, page 2). Specifically, the 
Office Action states that "the provisional application is essentially a research paper and discloses 
particular experiments performed. There is no generic disclosure of the methods as presently 
claimed." (Office Action, page 2). 

Applicant respectfully submits that she is entitled to priority to the May 14, 1998 date for 
as much as is disclosed and fully supported by the priority document. 

The priority document teaches that normal huntingtin was associated with MLK2, that 
expression of MLK2 activated the SEK1-JNK pathway and induced apoptosis in neuronal cells, 
and that co-expression of MLK2 with mutated huntingtin induced toxicity in non-neuronal cells 
(293 embryonic kidney cells) while co-expression of MLK2 with normal huntingtin did not (see 
priority document at page 2). The priority document teaches that neuronal toxicity was induced 
by either mutated huntingtin or by MLK2, and that this toxicity could be attenuated by a 
dominant negative SEK1 (see priority document at page 8). The priority document later states, 
"normal huntingtin is associated with MLK2 in intact cells and such association does not generate 
any cell toxicity" (priority document at page 9). 

Applicant avers that upon reading the specification, the ordinarily skilled artisan would 
understand that (1) activated MLK2 activity induced neuronal cell toxicity; and (2) MLK2 activity 
and its ability to induce cell death can be inhibited by co-expression of normal huntingtin. 
Applicant avers that upon reading the specification, the ordinarily skilled artisan would 
understand that a compound (such as normal huntingtin) has an ability to inhibit neuronal cell 
death if a neuronal cell with activated MLK activity does not die when contacted with the 
compound since MLK is a constitutively active kinase whose activity is held in check only 
because MLK associates with normal huntingtin (see priority document at pages 6-7 and Figures 
4B and 4C). The priority document teaches the ordinarily skilled artisan that compounds other 
than normal huntingtin can be similarly assessed for their ability to inhibit neuronal cell death. 
Thus, the priority document discloses the full scope of the cell culture based methods of the 
claimed invention. 

Furthermore, Applicant posits that the priority document also discloses the full scope of 
the cell free based methods of the claimed invention. As described in the priority document at 
page 5, MLK is a protein kinase that directly binds to and activates SEK1. Expression of mutant 
huntingtin in MLK-expressing neuronal cells induced cell death; however, when a dominant 
negative form of SEK1 was co-expressed with mutated huntingtin in MLK-expressing neuronal 
cells, death of the cells was blocked (see priority document at page 5). Thus, one of ordinary 
skill in the art, upon reading the priority document, would understand that active MLK's kinase 
activity in binding to and activating SEK1 results in cell death; the ordinarily skilled artisan 
would therefore understand that a compound (such as normal huntingtin) that inhibits MLK 
kinase activity and therefore inhibits phosphorylation and activation of a MLK substrate (e.g., 
SEK1) would also inhibit neuronal cell death. Accordingly, Applicant avers that the priority 
document discloses the full scope of the cell free based methods of the claimed invention. 

Given the broad teachings of the priority document, and the knowledge of the ordinarily 
skilled artisan as of the priority document's filing date, Applicant avers that the Application is 
entitled to the May 14, 1998 filing date of the priority document, since the priority document 
disclosed the full scope of the claimed invention. 

II. Rejection Under 35 U.S.C. §112, first paragraph. 

Claims 1-25, 27-32, and 44 stand rejected under 35 U.S.C. §112, first paragraph, because 
the specification "does not reasonably provide enablement for the breadth of the claims as 
written" (Office Action, page 3). Specifically, the Office Action states that "the specification 
does not describe nor enable performing the claimed methods in an intact mammal." (Office 
Action, page 4). 

Applicant has overcome this ground for rejection by amendment to the claims, as 
supported by the remarks made below. 

As an initial matter, Applicant has amended claims 1, 9, 14, and 19 to clarify that the cell 
culture based methods are what was intended to be covered by these claims. This being the case, 
Applicant respectfully avers that the specification fully enables the claims as presently amended 
because the ordinarily skilled artisan would have known, upon reading the specification, that the 
claimed methods allow an assessment of a compound's ability to inhibit MLK activity, and 
therefore inhibit neuronal cell death. As to claim 24, and the claims dependent thereon, which 
cover in vitro cell-free methods, Applicant likewise avers that the specification fully enables the 
ordinarily skilled artisan to practice these claimed cell-free methods to assess a compound's 
ability to inhibit MLK activity, and therefore inhibit neuronal cell death. 

As to the assertion that the claimed methods allow an assessment of compounds that 
inhibit (and not necessarily prevent) neuronal cell death, where death (or prevention thereof) of a 
neuronal cell is concerned, Applicant avers that the ordinarily skilled artisan would understand 
that "inhibit" and "prevent" are words that can be used interchangeably: any compound 
assessed as having an ability to inhibit cell death would also prevent cell death, and vice versa. 
The specification has described, in Example I (specification, page 24, line 21 through page 27, 
line 13), a cell culture based method for studying an exemplary neurodegenerative disease, 
namely Huntington's disease. Applicant avers that the ordinarily skilled artisan, upon reading 
Example I, would understand that any compound assessed by the methods of the invention as 
having an ability to inhibit death of cultured neuronal cells induced to apoptose by expression of 
polyglutamine-expanded huntingtin would also inhibit death of in vivo neuronal cells induced to 
apoptose by expression of polyglutamine-expanded huntingtin. 

Moreover, Applicant avers that the ordinarily skilled artisan would understand that using 
the methods of the invention, an assessment may be made of a compound's ability to inhibit 
neuronal cell death in general, regardless of whether or not the neuronal cell death is associated 
with Huntington's disease, Alzheimer's disease, or any other neurological condition. That the 
specification lists many different types of neurological conditions (see specification at page 13, 
line 6 through page 14, line 2) merely evidences that the claimed invention is meant to embrace 
any and all of these diseases, with Huntington's disease and Alzheimer's disease being but 
examples of the types of neurological conditions covered by the claims. 

The Office Action further states that because the specification "does not appear to 
specifically define the metes and bound of the intended activities [sic], proteins, and activities", 
"the specification does not describe or enable identification of any other MLK proteins or 
activities meeting the functional limitations of the claims and it is deemed to constitute undue 
experimentation to determine them." (Office Action, page 5). 

Applicant avers that the specification has provided sufficient guidance for identifying 
MLK proteins and activities for use in the claimed invention. 

First of all, Applicant notes that the Application does not claim MLK proteins, MLK 
nucleic acid molecules, or methods of producing MLK proteins or nucleic acid molecules. 
Rather, as is explained below, the Application claims using an MLK protein in a method for 
preventing neuronal cell death, where one of ordinary skill in the art, without undue 
experimentation, could identify an MLK protein based upon the teachings of the specification. 
Accordingly, the case cited by the Office Action, In re Maizel, 27 USPQ2d 1662, is not 
appropriately applied since In re Maizel concerns the lack of enablement of a claim covering 
DNA and vectors containing the DNA. 

As to identifying MLK proteins useful in the claimed invention, Applicant respectfully 
avers that such identification would be routine to one of ordinary skill in the art upon reading the 
specification. To buttress her position, Applicant respectfully directs the Examiner's attention to 
page 499, left column, of Dorow et al., Eur. J. Biochem. 234: 492-500 (1995) ("Dorow"; 
provided herewith as Appendix A; incorporated by reference into the specification (see page 3, 
line 2 and page 24, lines 16-17)). There, Dorow teaches that MLK2 expression is observed in 
brain, skeletal muscle, and in the pancreas, and that MLK3 expression is observed in most cell 
lines and tissues examined. Moreover, Applicant notes that MLK2, when co-expressed with 
mutated huntingtin, induced apoptosis in embryonic kidney cells (see specification at page 33, 
lines 15-19). 

Thus, while the specification states at page 10, line 13 that MLK2 "is a neuronal form of 
MLKs" (emphasis added), Applicant avers that the ordinarily skilled artisan would understand 
that MLK2 may not be the only MLK in neuronal cells, particularly given the teaching by Dorow 
that MLK2 is also found in colonic cells. The specification teaches that MLKs "are the only 
known kinases that directly activate the SEK1-JNK cascade and contain a SH3 domain as well as 
a SH3 domain binding site." (page 10, lines 11-12). Given this teaching, Applicant avers that the 
skilled artisan would understand that a kinase which (1) contains an SH3 domain, (2) contains an 
SH3 domain binding site, and (3) directly activates the SEK1-JNK cascade is a MLK within the 
scope of the claims. Given the teachings of the specification that demonstrate that an exemplary 
MLK, MLK2, directly binds to and phosphorylates SEK1 (see, e.g., page 36, lines 4-28), no 
further undue experimentation is required on the part of the ordinarily skilled artisan to make the 
determination of which MLK proteins are encompassed by the scope of the claims. 

Additionally, Applicant avers that one of ordinary skill in the art would understand what 
the metes and bounds of intended activities of MLK are. The specification, at page 9, lines 8-10, 
states that MLK is activated by being bound by an SH3 domain on a triggering protein. 
Activated MLK then directly binds to and stimulates a SEK1 protein. While it is true that 
activated MLK has enzymatic activity (e.g., kinase activity), activated MLK also has other 
activities, including activated MLK's ability to bind SEK1 (see, e.g., page 12, lines 24-25). 
Those of ordinary skill at the time the Application was filed will understand that an "activated" 







Serial No. 09/156,367 
Art Unit: 1631 

Examiner: Marianne P. Allen 

protein need not be enzymatically active, and a protein that is enzymatically active may also be 
activated by a conformational change revealing or hiding a regulatory domain. Thus, MLK protein 
activity includes, without limitation, a kinase activity and an ability to bind SEK1. Applicant 
avers that the ordinarily skilled artisan would understand that it matters not how MLK is 
activated to induce apoptosis of a neuronal cell, only that when MLK is activated (in the presence 
of appropriate stimuli), neuronal cell death results. Of course, activated MLK activity will 
increase or decrease depending upon increased or decreased rates of MLK transcription and/or 
translation (see specification, page 12, lines 20-21). 

Given its teachings, Applicant avers that the specification does not merely provide an 
invitation to experiment analogous to that in Genentech Inc. v. Novo Nordisk A/S, 42 USPQ2d 
1001, as has been asserted by the Office Action (see page 6). Rather, Applicant avers that the 
specification describes a model for neuronal cell death and teaches methods for utilizing an MLK 
protein to assess the ability of compounds to inhibit neuronal cell death. As such, Applicant 
avers that the specification has fulfilled the requirements of 35 U.S.C. §112, first paragraph. 

The Office Action has stated that a "reasonable correlation must exist between the scope 
of the claims and the scope of enablement set forth." (Office Action, page 5). Applicant does 
not dispute this requirement but, rather, posits that the specification has met the requirements for 
enablement. To buttress her position, Applicant respectfully directs the Examiner's attention to the 
specification at page 8, lines 23-28. There, the specification teaches that inhibition of MLK2 can 
protect a neuronal cell from apoptosis induced by polyglutamine-expanded huntingtin. As taught 
by the reference, Huntington's Disease Collaborative Research Group, Cell 72:971-983 (1993) 
(provided herewith as Appendix B), which is cited by the specification at page 8, line 28 and 
incorporated by reference (see specification page 24, lines 16-17), expression of polyglutamine- 
expanded huntingtin caused Huntington's Disease in humans. As the specification summarizes at 
page 32, line 15, "MLK-associated activity was involved in neuronal loss in Huntington's 
diseases." 

The specification teaches, at page 24, line 21 through page 27, line 13, that a cell culture 
based model for studying polyglutamine-expanded huntingtin induced neurodegeneration was 
developed. Later, at page 31, line 11 through page 34, line 12, the specification teaches that an 
exemplary MLK protein, MLK2, is associated with huntingtin and that a mutant MLK2 protein 
lacking kinase activity blocks apoptosis induced by glutamate or kainate receptor activation in 
neuronal cells. Still later, at page 34, line 14 through page 35, line 4 and page 36, line 3 through 
page 37, line 22, the specification describes the development of a cell free based model for 
studying neurodegeneration. 

Applicant avers that the specification as filed has met the standards of enablement as set 
forth in Manual of Patent Examining Procedure §2164.02 (7th Edition, Rev. 1, Feb. 2000). 
There, under the section "Correlation: in vitro/in vivo", MPEP §2164.02 states "if the art is such 
that a particular model is recognized as correlating to a specific condition, then it should be 
accepted as correlating unless the examiner has evidence that the model does not correlate. Even 
with such evidence, the examiner must weigh the evidence for and against correlation and decide 
whether one skilled in the art would accept the model as reasonably correlating to the condition." 
Applicant posits that one of ordinary skill in the art would have understood that the in vitro cell 
culture model and cell free model for neurodegenerative disease set forth in the specification 
correlates with in vivo neurological conditions, including those described in the specification 
(see, e.g., page 13, line 6 through page 14, line 2). The standards for enablement of 35 U.S.C. 
§112, first paragraph, require nothing more. 

Accordingly, Applicant respectfully requests that this rejection under 35 U.S.C. §112, first 
paragraph, be reconsidered and withdrawn. 

III. Conclusion. 

Applicant posits that the presently maintained rejections of the pending claims have been 
fully overcome by amendment and/or argument. Accordingly, Applicant respectfully submits 
that the pending claims are in condition for allowance. If the Examiner believes that any further 
discussion of this communication would be helpful, she is encouraged to contact the undersigned 
by telephone. 

Aside from the fee for the two month extension of time, no fees are believed to be due in 
connection with this communication. However, please apply any additional charges, or credit 
any overpayment, to our Deposit Account No. 08-0219. 



Hale and Dorr LLP 
60 State Street 
Boston, MA 02109 
617 526-6110 (Telephone) 
617 526-5000 (Fax) 

Dated: November 30, 2000 



Respectfully submitted, 




Hollie L. Baker 
Registration No. 31,321 

Complete nucleotide sequence, expression, and chromosomal localisation 
of human mixed-lineage kinase 2 

Donna S. DOROW, Lisa DEVEREUX, Guo-fen TU, Gareth PRICE, Jillian K. NICHOLL, Grant R. SUTHERLAND 
and Richard J. SIMPSON 

Research Division, The Peter MacCallum Cancer Institute, Melbourne, Victoria, Australia 
The Joint Protein Structure Laboratory, Ludwig Institute for Cancer Research [remainder illegible] 
[third affiliation illegible], Australia 

(Received 12 June 1995) - EJB 95 0959/3 

Protein kinases play pivotal roles in the control of many cellular processes. In a search for protein 
kinases expressed in human epithelial tumour cells, we discovered two members of a novel protein kinase 
family [Dorow, D. S., Devereux, L., [...]]. [...] MLK2 protein. The predicted MLK2 polypeptide [...] 
homology 3 (SH3) domain, a kinase catalytic domain, [...] terminal domain. The 22-amino-acid N-terminal 
[...] following the initiator methionine. Beginning at amino acid 23, the 55-amino-acid SH3 domain contains 
a 5-amino-acid insert in a position [...] inserts of 6 and 15 residues in the SH3 domains of [...] and the 
phosphatidylinositol [...]. [...] the SH3 domain [...] kinase catalytic domain with conserved motifs [...] 
residues C-terminal to the [...] separated by a 13-amino-acid spacer sequence and followed by [...]. The 
polybasic sequence contains a motif that is similar to nuclear localisation [...]. The C-terminal domain is 
composed of 491 amino acids [...] also has a biased ratio of basic to acidic [...] used for fluorescence [...]. 

Keywords: protein kinase; mixed-lineage kinase; leucine zipper; SH3 domain; DNA sequence. 

[...] growth and division (D'Urso et al., 1990; Birchmeier et al., 1993). They serve as growth factor 
receptors [...] and have been implicated in cellular [...] malignancy (Hunter and Karin, 1992; Posada 
and Cooper, 1992; Hunter and Pines, 1994). Protein kinases can be divided into two main groups both by 
amino acid sequence similarity and [...] for either serine/threonine or tyrosine. A [...] of 
dual-specificity kinases are structurally like the [...] group. Within the broad classification [...] 
further sub-divided into families whose members [...] higher degree of catalytic domain amino acid 
sequence [...] properties (Hanks et al., 1988). 

Most protein kinase family members also share [...] that reflect their [...] roles. These include 
regulatory domains that control [...] interaction with other proteins (Hanks, 1991). [...] elements, 
originally identified as conserved sequences in members of the src-related kinase family, are the [...] 
3 (SH3) domains (Sadowski et al., 1986; [...], 1991). These domains have now been found in a [...] 
proteins involved in intracellular signalling pathways that link activated cell surface receptors to 
downstream [...] ([...] and Gish, 1992). SH3 domains are also found [...]. While SH2 domains bind 
phosphorylated tyrosines in the cytoplasmic domains of activated receptors, SH3 [...] proline-rich 
sequences in their [...]. Additionally [...] localisation of [...] the vicinity of the cell membrane 
where the early [...] (Booker et al., [year illegible]; [...] et al., 1989; Bar-Sagi et al., 1993). 
Furthermore, it has recently been shown that SH3 domains also participate in regulating the 

Correspondence to [...], Peter MacCallum Cancer Institute, Melbourne, Victoria, Australia 
Fax: +61 3 9656 1411. 
Abbreviations. [...] kinase; SH, [...]. 
Note. The novel nucleotide data published here has been deposited with the EMBL sequence databank and is 
available under accession number X90846. 



Dorow et al. (Eur. J. Biochem. 234) 



activity of both protein kinases (Superti-Furga et al., 1993) and 
guanosine triphosphatase effector proteins (Gout et al., 1993). 

Another regulatory domain, usually found in transcription 
factors such as the oncogenes fos, jun, and myc, is the leucine 
zipper (Landschultz et al., 1988). Leucine zipper sequences, 
containing a leucine or isoleucine residue at every seventh position 
for a stretch of at least 22 amino acids, take up amphipathic 
helical conformations with the leucine side chains forming a 
stripe down one face. In the transcription factors, the leucine 
zipper is preceded by a stretch of basic amino acids that 
constitute the DNA-binding region. Leucine zippers promote 
dimerization through hydrophobic interactions between heptad leucines 
(O'Shea et al., 1991). Such dimerization appears to activate 
DNA-binding by orientating the basic side chains of DNA-binding 
residues to enable correct contact with DNA (Vinson et al., 
1989). While leucine zippers are not commonly associated with 
protein kinases, the cyclic-GMP-dependent kinase has a leucine 
zipper through which it forms its active state dimer (Wolfe et 
al., 1989). 
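
The heptad pattern described above (a leucine or isoleucine at every seventh position over a stretch of at least 22 amino acids) can be illustrated with a minimal Python sketch. The function name, its parameters, and the toy sequence are hypothetical and introduced only for illustration; they do not come from the article, and the scan checks only the spacing rule, not helical propensity or any other property of a real leucine zipper.

```python
def find_heptad_repeats(sequence, min_span=22, residues="LI"):
    """Return 0-based start indices where L/I occurs at every 7th position.

    Hypothetical helper: it tests only the spacing rule quoted in the text.
    """
    seq = sequence.upper()
    # Smallest number of heptad positions whose span covers min_span residues:
    # n positions span 7 * (n - 1) + 1 residues, so 22 residues need 4 positions.
    n_positions = -(-(min_span - 1) // 7) + 1
    last_offset = 7 * (n_positions - 1)
    hits = []
    for start in range(len(seq) - last_offset):
        if all(seq[start + 7 * k] in residues for k in range(n_positions)):
            hits.append(start)
    return hits


# Toy 22-residue sequence (not MLK2): L, L, I, L at positions 0, 7, 14, 21.
toy = "LAAAAAA" + "LAAAAAA" + "IAAAAAA" + "L"
print(find_heptad_repeats(toy))  # [0]
```

A longer zipper is reported as several overlapping start positions, one for each window of four heptad positions that satisfies the rule.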

In a previous report (Dorow et al., 1993), we described a 
novel protein kinase family, the mixed-lineage kinases (MLK). 
These kinases have an unusual catalytic domain structure that is 
a hybrid between the tyrosine and serine/threonine-specific 
types. In addition, they possess a unique double leucine zipper 
and basic domain that has only been found in members of the 
MLK family. We first reported partial amino acid sequences for 
two members of this family, MLK1 and 2. Further studies 
revealed that each of these proteins contains a SH3 domain 
(Dorow et al., 1994 [...]). Recently, 
two further members of the MLK family have been reported. 
One of these, MLK3 (Ing et al., 1994), also reported as PTK1 
(Ezoe et al., 1994) and SPRK (Gallo et al., 1994), is very closely 
related to MLK1 and 2. A fourth, more distantly related member 
of the MLK family, DLK (Holzman et al., 1994), contains MLK-like 
catalytic and double leucine zipper domains but lacks a SH3 
domain. Thus, the MLK enzymes are an emerging family of 
protein kinases with a unique mixture of structural domains. In 
addition to the unusual nature of their kinase and double leucine 
zipper domains, they are the only protein kinases thus far 
reported that contain SH3 domains in the absence of SH2 
domains. In the present study, we describe the complete nucleotide 
sequence, tissue expression, and chromosomal localisation of 
MLK2 from human brain. 



MATERIALS AND METHODS 

Cloning and sequence analysis. Segments of cDNAs encoding 
catalytic subdomains of protein kinases expressed in the 
epithelial tumour cell line Colo 16 (Moore et al., 1975) were 
[...] RNA by reverse transcriptase PCR by the 
[...] et al. (1989). Degenerate PCR primers were 
[...] on the sequences encoding conserved motifs in subdomains 
[...] (Wilks, 1989) of the epidermal-growth-factor receptor 
family kinase catalytic domains (Hanks et al., 1988). 
Sequences of the primers were as follows: forward primer (with 
[...]) 5'-[sequence illegible]; reverse primer (with an added 
[...] underlined) 5'-[sequence illegible]. Several PCR [...] 
were cloned [...] a T7 Super-base sequencing kit ([...]). 
One PCR product ([length illegible] bp) was used as a probe to screen 
a human colon λgt11 cDNA library (Clontech). The library was 
[...] (Clontech catalogue). Four clones 
were isolated and their inserts sequenced. All of these clones 
represented overlapping areas of the same sequence that we 
designated MLK1. One cDNA from this screen was used as a probe 
to rescreen the same library, and a 1034-bp cDNA was isolated 
and sequenced (Dorow et al., 1993). The 1034-bp MLK2 cDNA 
was then used as a probe to screen a human brain λgt10 library. 
Approximately 0.5 × 10[exponent illegible] clones were screened over several 
screenings, and one 3454-bp clone was isolated. The insert from 
this clone was subcloned into pUC18 and sequenced on both 
strands using an Applied Biosystems 373 automated DNA 
sequencer. Sequencing reactions were carried out with a Prism 
Ready Reaction dyedeoxy terminator cycle sequencing kit 
(Applied Biosystems) according to manufacturer's instructions. 
Temperature cycling was performed on a Perkin Elmer GeneAmp 
PCR system 9600. The cycling procedure was 15 s at 
93°C, 15 s at 50°C, and 4 min at 60°C for 25 cycles. The 
procedure was modified for (G+C)-rich templates to include 5% 
(by vol.) dimethyl sulfoxide and a denaturing temperature of 
98°C. All chemicals were purchased from Sigma unless 
otherwise stated. 

Northern-blot analysis. A multi-tissue Northern (MTN) 
blot (2 µg mRNA/lane) was purchased from Clontech and 
treated according to standard procedures supplied by the 
manufacturer. Briefly, the blot was prehybridised in 5×NaCl/P/EDTA 
(0.75 M NaCl, 0.05 M NaH2PO4 and 5 mM EDTA), 10×Denhardt's 
solution [1% (mass/vol.) Ficoll (Pharmacia type 400), 1% 
(mass/vol.) polyvinylpyrrolidone, and 1% (mass/vol.) BSA], 
100 µg/ml sheared, denatured salmon sperm DNA, 50% (by 
vol.) deionised formamide (Fluka) and 2% (mass/vol.) SDS [...] 
4 h at 42°C. Probes were labelled with [32P]dATP or dCTP (NEN 
Dupont, 3000 Ci/mM) by random priming (Feinberg and 
Vogelstein, 1983) to a specific activity of approximately 
10[exponent illegible]-10[exponent illegible] cpm/µg, added 
to the filters in prehybridisation solution, and incubated 
overnight at 42°C. Final stringency washes were in 0.5×NaCl/Cit 
([...] pH 7.0, 75 mM NaCl) and 0.5% (mass/vol.) SDS at [...]. 

Fluorescence in situ hybridisation. The 3454-bp MLK2 
cDNA was labelled by nick-translation with biotin-14-dATP 
(Gibco BRL) and hybridised in situ at a concentration of 10 ng/µl 
to metaphase chromosomes from two normal male donors. 
Chromosomes were prepared from peripheral blood leucocytes 
by standard procedures following the method of Wheater and 
Roberts (1987). Fluorescence in situ hybridisation (FISH) was 
performed as described by Callen et al. (1990) except that the 
chromosomes were stained before analysis with both propidium 
iodide and 4,6-diamidino-2-phenylindole. Images of metaphase 
preparations were recorded using a Panasonic WV-RL600 CCD 
camera. 



RESULTS 

Cloning and nucleotide sequence analysis of MLK2. In a 
search for protein kinases expressed in human epithelial tumour 
cells, we used a PCR strategy to amplify segments of kinase 
catalytic domains from epithelial cell RNAs. The primers were 
[...] encoding conserved motifs in the catalytic domains 
(Wilks, 1989) of epidermal-growth-factor receptor family 
members. Several PCR products were cloned and sequenced. One 
PCR product that represented a novel kinase catalytic domain 
sequence was used to probe a human colonic cDNA library and 
several clones were isolated. These cDNAs encoded overlapping 
[...] protein kinase that we named MLK1 (Dorow 
et al., 1993). When the MLK1 cDNA was used to rescreen the 
same library, a 1034-bp cDNA fragment that had 65% nucleotide 
sequence identity to MLK1 was isolated. It was clear that 
together these two molecules represented a new family of 
protein kinases. 

[Fig. 1 sequence listing: the 3454-bp nucleotide sequence and predicted amino acid sequence of the human MLK2 cDNA; not legibly reproduced in this copy.] 

Fig. 1. [...] amino acid sequences of the human MLK2 cDNA. Nucleotide and amino acid numbers are indicated at the [...]. [...] define the leucine heptads and the basic [...]. 
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££LK2 

ML1C3 

GRB-2C 

PfcC 

C-9RC 

N-SRC 

SPC 

PI-3K 



» • < INSEHT- 

gpvwt£vpd32ju.ode5&lrro3rvqvlsqd CAVSG 



NPVWT 



A1FD5BPSGQI 



DPQEDGEI GPPROttlHVMDITS 



TFI RSHI I QMVEKQ 



ORKODRVBVLSRD AAISG 



DEGjwirGO. lpsorvgvf|?3NTvWp 

D2GfVO?AG0 VG GQVGIPP0NYV3R 

DPKWWKGA CH GQTGMPjPRNYvjrP 

EQGWWRGD YGOKKQ WFPSWYVjEE 

E GE WWLAH SLTTGQTGY J PSKYYAP 

EGEWWLAH SLS TGQTGYI P^KYt|aP 

KKDWW3CVE VNDRQ OFVPXAYVKK 
GYQYRj\LYDYiKKERgEgyLHLj0Dil LTVHKGS LVAL G FSD CQEARP E EljcNLNG YNETTGER QDp jp CTYvf E Y 



VTTFALYrriESRTETDZ.SFKX|3ERLQIVNNT 
VTTffVXLYI)YESRTZT DX. SPKXGZRLQXVNNT RKVDVR 
KELVljALYp^ QEKSPRBVraKKgDILTLLNST 



Fig. 2. Aliment or SH3 domain amino add sequences. The predicted amino acid sequence of the SH3 domain of MLK2 ali«n-d with the 
sequence, of MLK3 lh c C- term.nal SH3 domain of GRB 2 (GRP. 2C). phospholipase Cy (PLC). c,v Wi „-,*. spectrin (SPQ and P 85 P Td J, K 
!n^tS n0 " th ° l m3ke UP ^ SH3 d ° m3iR C ° m m ° tifS arc ^ ™* * c Position, of the conserved S ™£ Ste 



The 1034-bp MLK2 cDNA fragment was then used as a
probe to screen a human brain cDNA library and a single clone
was isolated and its insert sequenced. The 3454-bp nucleotide
sequence of this insert is shown in Fig. 1. The insert contains
289 bp of 5'-untranslated nucleotides, an open reading frame of
2862 bp, and 304 bp of 3'-untranslated nucleotides. The putative
initiator AUG codon begins at nucleotide 290 and is preceded
by an in-frame stop codon beginning at nucleotide 122. This
methionine codon is contained within the sequence CCCAUGG
(positions -3 to +4). According to the scanning model of Ko-
zak (1989), translation is initiated at the most 5' AUG
in a favourable context. The most favourable sequence for rec-
ognition by eukaryotic ribosomes is A/GCCAUGG (positions
-3 to +4). Mutagenic analysis of this sequence, however, has
shown that in the absence of the purine at the -3 position, there
is a strong preference for recognition of sequences with a G at
position +4 (Kozak, 1986). The sequence surrounding the AUG
beginning at nucleotide 290 fulfils this requirement. There is one
upstream AUG, beginning at nucleotide 165, in a more favoura-
ble context for initiation. This AUG, however, is in a different
frame from that of the main open reading frame of the sequence
and is followed by an in-frame stop codon beginning at nucleo-
tide 255. In such a situation, the main start site of the protein is
predicted to be the downstream AUG (Kozak, 1986). Thus the
AUG at nucleotides 290-292 most likely encodes the actual N-
terminus of the MLK2 protein. At the extreme 3' end of the
MLK2 cDNA, a stretch of 13 adenine nucleotides is preceded
by a typical polyadenylation signal (AATAAA) beginning 29
bases upstream from the poly(A) tract. This suggests that the 3'
end of the MLK2 mRNA is included in this insert.
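
The context rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis: the function name `kozak_context` and the simplified `acceptable` flag (a purine at -3, or a G at +4 when the purine is absent) are assumptions made for this example.

```python
def kozak_context(mrna: str, aug_index: int) -> dict:
    """Describe the context of an AUG; aug_index is the 0-based index of its A.

    Positions follow the usual convention: the A of AUG is +1, the base
    immediately upstream is -1, so -3 lies three bases upstream and +4 is
    the base directly after the G of AUG.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    if aug_index < 3 or aug_index + 3 >= len(mrna):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the AUG")
    purine_at_minus3 = mrna[aug_index - 3] in "AG"
    g_at_plus4 = mrna[aug_index + 3] == "G"
    return {
        "window": mrna[aug_index - 3:aug_index + 4],  # positions -3 to +4
        "purine_at_-3": purine_at_minus3,
        "G_at_+4": g_at_plus4,
        # Simplified reading of the rule quoted in the text (assumption).
        "acceptable": purine_at_minus3 or g_at_plus4,
    }


# The window reported for the putative MLK2 initiator codon.
ctx = kozak_context("CCCAUGG", 3)
print(ctx)
```

For the reported window CCCAUGG the sketch flags no purine at -3 but a G at +4, which is the situation the text describes.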
A search of nucleotide sequence databases with the complete
MLK2 cDNA revealed the presence of four EST fragments with
varying degrees of identity to MLK2. Only one of these frag-
ments, from an infant brain cDNA library (GenBank T15757),
represents the 3' terminus of the MLK2 message. The other three
(GenBank T98616, H01340, and H01390) all represent internal
fragments of MLK2 cDNA. Both H01340 and H01390 are from
a placental cDNA library and represent parts of the MLK2 SH3
and kinase domain sequence. T98616, however, is from a human
foetal liver-spleen library and contains both a region of strong
similarity to MLK2 and a region that is not similar to MLK2 at
all. This EST may represent an artefact in the library or an al-
ternately spliced product of the MLK2 gene.

[...] sequence. The longest open reading frame in the
cDNA encodes a putative protein of 954 amino acids with
a calculated molecular mass of 103 506 Da. Based on a
hydrophobicity analysis (Kyte and Doolittle, 1982) of the pre-
dicted amino acid sequence, the MLK2 protein contains no obvi-
ous signal sequence or membrane-spanning region (data not
shown). Comparison of the sequence with conserved motifs for
defined structural domains, however, reveals that MLK2 con- 
tains a SH3 domain, a kinase catalytic domain, and the unique 
double leucine zipper and basic domain of the MLK family 
(Dorow et al., 1993). The extreme N-terminal sequence of the
MLK2 protein is acidic with four glutamic acid residues imme-
diately following the putative initiator methionine (Fig. 1). This
22-amino-acid N-terminal sequence is unique with no significant
similarity to any sequence in the protein data bases. The N-
terminal region is followed by a 55-amino-acid sequence (resi-
dues 23-76) containing the highly conserved consensus motifs
for SH3 domains (Musacchio et al., 1992). Alignment of this
sequence with those of SH3 domains from the C-terminal of
GRB2, phospholipase Cγ (PLC), spectrin (SPC), cellular src (c-
src), neuronal src (n-src), and the p85 subunit of phosphatidyl-
inositol 3'-kinase (p85-PtdIns3K) is shown in Fig. 2. The align-
ment shows that the MLK2-SH3 domain has a 5-amino-acid in-
sert (residues 49-53) in a region corresponding to inserts of 6
and 15 amino acids in the SH3 domains of n-src (Martinez et
al., 1987) and p85-PtdIns3K (Skolnick et al., 1991), respectively.
These inserts are located in a region of the sequence that has
been postulated to influence selectivity of SH3 domain binding
(Booker et al., 1993; Koyama et al., 1993).

The MLK2 kinase catalytic domain, residues 101-359, con-
tains all of the conserved amino acids forming the 11 sub-do-
main motifs used by Hanks et al. (1988) to define protein kinase
families. Overall, the kinase domain amino acid sequence is
more similar to that of the tyrosine than the serine/threonine-
specific kinases. Furthermore, there are several motifs in the C-
terminal portion of the MLK2 catalytic domain sequence that
are found only in the tyrosine kinases and the raf oncogene fam-
ily (Hanks et al., 1988). In particular, four motifs involving
either tryptophan or proline residues are highly conserved in the
receptor tyrosine kinases and the MLK family. In the MLK2
sequence these correspond to Trp294-Glu295, Cys339-Trp340,
Pro301-Tyr302, and Cys328-Glu330. The MLK2 sequence,
however, also contains a lysine residue (position 224) in subdo-
main VIb (Hanks et al., 1988) that is conserved in the serine/
threonine, rather than the tyrosine, kinases. Thus, the MLK2 ca-
talytic domain sequence falls into a category closely related to
the tyrosine kinases but with predicted specificity for serine and/ 
or threonine. 

Fig. 3. Alignment of the predicted amino acid sequences of MLK2 and 3. Predicted amino acid sequences of MLK2 and 3 were aligned using
the program CLUSTAL (Higgins and Sharp, 1988) and by eye. Protein domains are identified above the line. The SH3 domain is delineated by
brackets and three SH3 consensus motifs containing aromatic residues are numbered within the brackets. The kinase domain is marked at the
beginning and end by arrows and the subdomain motifs are numbered with roman numerals. Leucine zipper heptad residues are marked and
the beginning of the C-terminal domain is defined. The basic domain is marked at the beginning and end, as is the glycine/serine-rich peptide.
Sequences similar to the core motif for SH3 domain recognition are marked with the letters P and X, where P represents a conserved proline
and X any amino acid (shaded).

Beginning at residue 384 in the MLK2 amino acid sequence,
there is an 80-amino-acid region that contains two leucine/iso-
leucine heptad repeats and a basic motif. Each of the heptad
motifs fits all of the criteria set out by Landschulz et al. (1988)
for the classic leucine zippers of the transcription factors. Thus,
they each contain 22 amino acids with a leucine or isoleucine
residue at every seventh position, a higher than average content
of charged amino acids (>50%), and an absence of the helix-
breaking residues proline and glycine. The two zipper motifs are
separated by a 13-amino-acid spacer sequence. A stretch of basic
amino acids begins 9 residues after the last heptad leucine of the
second zipper motif. Of 15 residues in this region, nine are basic
(Lys or Arg) with no acidic residues. Furthermore, within this
basic sequence is a motif (VRKRKG) that is very similar to
nuclear localisation signals reported for several proteins, includ-
ing the simian virus 40 large T antigen (reviewed by Kalderon
et al., 1984).
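
The three zipper criteria listed above lend themselves to a simple window test. The following Python sketch is an illustration under stated assumptions, not the procedure used in the study: the names `is_zipper_like` and `find_zipper_like` are hypothetical, "charged" is taken to mean Asp, Glu, Lys and Arg (histidine is not counted), and the test peptide is invented rather than taken from MLK2.

```python
CHARGED = set("DEKR")  # assumption: His is not counted as charged


def is_zipper_like(segment: str) -> bool:
    """Apply the three criteria quoted in the text to one 22-residue window."""
    seg = segment.upper()
    if len(seg) != 22:
        return False
    # Leu or Ile at every seventh position: residues 1, 8, 15 and 22.
    heptad_ok = all(seg[i] in "LI" for i in (0, 7, 14, 21))
    # More than half of the residues charged.
    charged_fraction = sum(aa in CHARGED for aa in seg) / len(seg)
    # No helix-breaking proline or glycine anywhere in the window.
    no_helix_breakers = not any(aa in "PG" for aa in seg)
    return heptad_ok and charged_fraction > 0.5 and no_helix_breakers


def find_zipper_like(protein: str) -> list[int]:
    """Return 1-based start positions of every 22-residue window that passes."""
    return [
        i + 1
        for i in range(len(protein) - 21)
        if is_zipper_like(protein[i:i + 22])
    ]


# Invented test peptide (not an MLK2 sequence): Leu at positions 1, 8, 15, 22.
toy = "LEEKRDELKKEDRRLEEKRDEL"
print(is_zipper_like(toy))
print(find_zipper_like("AAA" + toy + "AAA"))
```

A window scan of this kind only flags candidates; it says nothing about whether a segment actually forms a coiled coil.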

Following the basic region, there is a C-terminal domain of
491 amino acids that is rich in serine/threonine (17%) and pro-
line (16%). The high proline content is most striking in a stretch
of 218 amino acids near the C-terminus (residues 712-929),
where 22% of the residues are proline. One particular 20-amino-
acid sequence (774-793) contains 11 proline, 4 serine, and 3
threonine residues. Furthermore, there are several poly(proline)
motifs (Fig. 3) within the C-terminal domain that are similar to
consensus sequences identified as binding sites for SH3 domains
from a number of proteins (Ren et al., 1993; Yu et al., 1994).
The amino acid composition of this domain is also biased toward
basic, rather than acidic, residues with a calculated pI of 9.38.
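
Composition figures such as those quoted above are straightforward residue counts. As a hedged sketch (the function name `composition_percent` is hypothetical, and the 20-residue string below is an invented stand-in with the stated residue counts, not the actual MLK2 segment 774-793), the calculation could look like this:

```python
from collections import Counter


def composition_percent(peptide: str) -> dict[str, float]:
    """Return the percentage of each residue type in a peptide."""
    counts = Counter(peptide.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


# Invented 20-mer with 11 Pro, 4 Ser and 3 Thr, matching the counts in the
# text; the residue order and the two filler residues are arbitrary.
toy = "P" * 11 + "S" * 4 + "T" * 3 + "AL"
comp = composition_percent(toy)
print({aa: round(pct, 1) for aa, pct in sorted(comp.items())})
```

With these counts the window is 55% proline, 20% serine and 15% threonine.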

Alignment of the predicted amino acid sequences of MLK2
and MLK3 (Fig. 3) shows a high degree of identity within their
SH3, kinase catalytic, and leucine zipper/basic domains. This is
most obvious for the kinase catalytic domains that share 83%
amino acid sequence identity. If conservative substitutions are
considered, the catalytic domain similarity between MLK2 and
3 is 90%. While the sequences of the SH3 domains have slightly
reduced identity (70%), they share a very high degree of conser-
vation of amino acids. Only 3 of 16 substitutions within the
55-amino-acid domain are non-conservative, corresponding to a
similarity of 95%. There is, however, a single amino acid inser-
tion in this domain in MLK2 (Ser65) compared to MLK3. There
are several insertions/deletions between the two sequences in
the non-conserved region joining the SH3 domain to the kinase
catalytic domain (MLK2, residues 86-93). The insertion at po-
sition 65, however, is the only one that affects alignment of the
MLK2 and 3 sequences within one of the known structural do-
mains. The double leucine zipper and basic domain is also well
conserved, with 65% identity/75% similarity between the two
sequences.
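
The SH3 domain percentages follow from the substitution counts given in the text. The short Python sketch below reproduces that arithmetic; the function name `identity_similarity` is hypothetical, and treating every non-substituted position as identical is a simplifying assumption (the text handles the Ser65 insertion separately).

```python
def identity_similarity(length: int, substitutions: int,
                        non_conservative: int) -> tuple[float, float]:
    """Percent identity and similarity from substitution counts.

    identity   = positions with no substitution / length
    similarity = positions that are identical or conservatively
                 substituted / length
    """
    identical = length - substitutions
    similar = length - non_conservative
    return 100.0 * identical / length, 100.0 * similar / length


# Figures quoted for the SH3 domain: 55 residues, 16 substitutions,
# 3 of them non-conservative.
ident, simil = identity_similarity(55, 16, 3)
print(f"{ident:.1f}% identity, {simil:.1f}% similarity")
```

The result, about 70.9% identity and 94.5% similarity, agrees with the rounded 70% and 95% reported.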

Outside of the SH3, kinase and leucine zipper/basic domains
the similarity between MLK2 and MLK3 decreases dramatically.
In the region closest to their N-termini, for instance, there is no
similarity between the sequences. This N-terminal segment is 22
residues long in MLK2 and consists of 47 residues in MLK3.
The four glutamic acid residues at the N-terminus of MLK2 are
not present in the MLK3 sequence. The N-terminal region of
MLK3, however, contains a 13-residue sequence with 11 glycine
and 2 serine residues, including a stretch of nine consecutive
glycine residues. It is interesting that this MLK3 poly(glycine)
sequence is located at about the same distance from the begin-
ning [...] as the poly(glutamic acid) sequence in MLK2.

The large C-terminal domains of the two proteins, while both
rich in serine/threonine and proline, are also poorly conserved.
Aside from a few short sequences near the end of the basic
domain, there is little actual identity between the two sequences.
Furthermore, the size of the C-terminal domains in the two pro-
teins differs quite considerably, with 491 amino acids in MLK2
compared to 360 in MLK3.

Fig. 4. Northern-blot analysis of human tissue mRNAs. Autoradio-
graph of an RNA blot hybridised with a MLK2 cDNA probe. Each lane
contains 2 µg mRNA from (1) heart, (2) brain, (3) placenta, (4) lung, (5)
liver, (6) skeletal muscle, (7) kidney, and (8) pancreas. The MLK2 band
is indicated. The blot was washed at a stringency of 0.5 x NaCl/Cit



Northern-blot analysis. Expression of MLK2 RNA was exam-
ined by Northern-blot analysis of mRNA from human heart,
brain, placenta, lung, liver, skeletal muscle, kidney, and pan-
creas. The probe used for this analysis contained nucleotides
85-700 (Fig. 1) including 200 nucleotides of the 5'-untranslated
MLK2 cDNA, as well as the sequence encoding the SH3 domain
and the N-terminal region of the catalytic domain. In this analy-
sis, a band at about 3.8 kb was detected at highest levels in RNA
from brain and skeletal muscle, with a lower level in pancreas
(Fig. 4). Expression in the other tissues was extremely low or
undetectable. Given the high degree of sequence identity be-
tween MLK1-3, some cross-hybridisation in this type of assay
might be expected. MLK3, however, has a much wider pattern
of expression in human tissues, with high levels in lung, liver
and kidney as well as brain and skeletal muscle (Ing et al., 1994;
Ezoe et al., 1994; Gallo et al., 1994). As hybridisation with the
MLK2 probe did not reveal a significant level of expression
in lung, liver or kidney, cross-hybridisation with MLK3 is not
indicated. A MLK1 cDNA probe, however, hybridises to a band
of an entirely different size compared to MLK2 (Dorow, D.,
unpublished results), ruling out possible cross-hybridisation with

Chromosomal localisation. To determine the chromosomal lo-
cation of the human MLK2 gene, metaphase chromosomes from
two normal male donors were hybridised by the FISH technique
using the 3454-bp MLK2 cDNA as a probe. 25 metaphases from
one donor were examined, and a fluorescent signal was detected
on one or both chromatids of all chromosomes 19. Of this signal
73% was located at q13.2, with the remainder in the region of
q13.1 to q13.3. There were 23 non-specific background signals
recorded in the 25 metaphases. A similar result was obtained
with the chromosome preparation from the second donor, con-
firming the localisation of the MLK2 gene to chromosome 19
q13.2.

DISCUSSION

We have reported the cDNA sequence, expression, and chro-
mosomal location of MLK2, a member of the mixed-lineage
family of protein kinases. To date, four members of this family
have been described and complete cDNA sequences have now
been reported for three: MLK2 (this study), MLK3 (Ing et al.,
1994), and DLK (Holzman et al., 1994). MLK2 and 3 are each
comprised of a SH3 domain, a kinase catalytic domain that is
related to both the serine/threonine and the tyrosine types, a
double leucine zipper, and a basic domain. In addition, they each
have a large C-terminal domain rich in serine/threonine and pro-
line. Our recent sequence data show that MLK1 also has a SH3
domain in its N-terminal region (Dorow, D., unpublished re-
sults). The fourth MLK family member, DLK (Holzman et al.,
1994), lacks a SH3 domain, but otherwise shares significant
structural similarity to MLK1-3.

SH3 domain. Among SH3 domains of known mammalian pro-
teins, the SH3 domain sequences of the MLK proteins are most
similar to that of the C-terminal SH3 domain of GRB2, an adap-
tor protein comprised of a SH2 domain flanked by SH3 domains
(Lowenstein et al., 1992). GRB2 binds activated EGF receptors
and platelet-derived-growth-factor receptors and connects them
to the ras signalling pathway. The insert region, common to the
SH3 domains of the MLK proteins, p85-PtdIns3K and n-src,
however, is not found in the GRB2-SH3 domain sequence.

Three-dimensional structures of SH3 domains from several
proteins, including α-spectrin (Musacchio et al., 1992), c-Fyn
(Noble et al., 1993), Lck (Eck et al., 1994), c-src (Yu et al.,
1992), p85-PtdIns3K (Booker et al., 1993; Koyama et al., 1993),
and phospholipase Cγ (Kohda et al., 1993) have been published.
While SH3 domains from the various proteins have a low degree
of amino acid identity, all show a similar folding pattern. There
are a series of motifs with several conserved aromatic residues,
corresponding to MLK2 residues 24-26 (Phe-Asp-Tyr), 57 and
58 (Trp-Trp), and 71-75 (Pro-Ser-Asn-Tyr-Val), that contribute
to the SH3 domain consensus (Koch et al., 1991). In the folded
structures, the conserved residues are located in β-sheets con-
nected by variable loops and the aromatic side chains line a
binding pocket on the surface of the protein (Koyama et al.,
1993; Booker et al., 1993; Yu et al., 1994). Part of one variable
loop forms an end of the binding pocket, leading to speculation
that residues within this loop may contribute to fine specificity
of the domain. The inserts in the SH3 domains of MLK2,
MLK3, p85-PtdIns3K, and n-src are all located within this loop.
The placement of the inserts at one end of the binding pocket
suggests that they may play a role in target recognition by these
SH3 domains. This is further supported by the differential pep-
tide recognition of the SH3 domains of c-src and n-src (Cicchetti
et al., 1992; Booker et al., 1993), which are almost identical except
for the presence of the n-src insert, the only other difference
between the two being a Ser→Thr substitution in another vari-
able loop (Fig. 2). While the 5-residue insert in the MLK family
contains only hydrophobic and polar residues, four of the six
residues in the n-src insert are charged. In the longer p85-
PtdIns3K insert, the first six residues are hydrophobic with sev-
eral charged residues in the remaining nine. These differences
suggest that the insert sequence may be an important determi-
nant of specificity of the MLK-SH3 domain.

A second variable area, corresponding to residues 63-67 of
MLK2, is located between the last two consensus motifs of the
SH3 domain sequence. Within this region there are three re-
placements and one insertion in MLK2, compared to MLK3.
The sequence of this loop is Leu-Pro-Ser-Gly-Arg in MLK2
compared to Val-Gly-Gly-Gln in MLK3. Sequence variation
within this region, therefore, may also affect recognition speci-
ficity of MLK-SH3 domains.

Catalytic domain. As has been suggested for other kinase fam-
ily members (Hanks et al., 1988), the high degree of catalytic
domain identity between the MLK proteins implies that they
may also have similarity in their cellular roles. The kinase do-
main sequences of MLK1-3 are highly conserved and closely
related to members of the tyrosine-specific kinase families
(Hanks, 1991). The kinase domain of DLK, while conserving
the general features of the MLK enzymes, shares only 36%
amino acid identity to MLK1 (Holzman et al., 1994) and several
gaps must be introduced to align the sequences. DLK, however,
has very strong similarity to a 108-residue fragment of a putative
human serine/threonine kinase in the PIR data base (accession
number S37420; Schultz and Nigg, 1993). Within the region of
overlap, these two sequences share 87% identity and 92% simi-
larity (Holzman et al., 1994). As has been suggested by Holz-
man et al. (1994), DLK and the protein represented by PIR ac-
cession number S37420 appear to form a second more distantly
related subgroup within the MLK family. Both DLK and MLK3
(SPRK) autophosphorylate on serine and threonine in immune
complex kinase assays (Holzman et al., 1994; Gallo et al.,
1994). To date, no activity of any of the MLK proteins toward
other substrates has been demonstrated. It is therefore not pos-
sible to rule out specificity for tyrosine phosphorylation of some
target protein. It is most probable, however, that the MLK en-
zymes will all display serine/threonine activity. In this respect,
SPRK is the only kinase with demonstrated serine/threonine
specificity to also have a SH3 domain (Gallo et al., 1994).

The double leucine zipper domain. The double leucine zipper
domain is the most striking structural feature of the MLK se-
quences. The overall structure of this domain is highly con-
served in all members of the MLK family. Furthermore, the dis-
tance between the kinase domain and the first leucine of zipper
motif no. I is almost identical between MLK1-3 and DLK. In
DLK, however, there is an 18-amino-acid insert in the spacer
region between the two zipper sequences. Thus, the zipper mo-
tifs are 13 residues apart in MLK1-3 and 31 in DLK. As leu-
cine zippers are commonly found in proteins that form dimers
or higher order oligomers, the zippers of MLK proteins may
participate in formation of complexes with themselves, other
members of the family, or other proteins. Chou and Fasman
(1978) analysis of the double zipper region of MLK1-3 has
predicted a helix-turn-helix conformation for this domain
(Dorow et al., 1993; Gallo et al., 1994). As discussed previously
(Dorow et al., 1993), such a conformation would allow for in-
teraction between the two zippers of one MLK molecule to form
a uniquely folded zipper-turn-zipper domain.

The conserved basic sequence following the second zipper
motif in MLK1-3 is placed at about the same distance from the
leucine zipper motifs as the basic DNA-binding sequence of the
transcription factors. In the transcription factors, however, the
DNA-binding peptide is on the N-terminal side of the leucine
zipper. The basic sequence is not conserved in the DLK se-
quence, although DLK contains one similarly placed stretch of
eight amino acids of which four are basic (Holzman et al., 1994).
The basic domains of MLK1-3 all contain a motif that is con-
sistent with nuclear localisation signals that function in several
proteins (Kalderon et al., 1984). It has not yet been determined,
however, if the basic sequence functions as a nuclear targeting
signal in any of the MLK proteins.

C-terminal serine/threonine and proline-rich domain. While
the C-terminal domain amino acid sequences of MLK2 and 3
each comprise a similar high serine/threonine and proline
content, the actual sequences have little identity. There are a few
short segments that share sequence similarity, but these consti-
tute only a minute proportion of the total sequences of the C-
terminal domains. DLK contains a 332-amino-acid C-terminal
domain that is also rich in serine and proline. The C-terminal
domain was not coded by the human colonic cDNA clone from
which the MLK2 partial amino acid sequence was previously
predicted (Dorow et al., 1993). In the cDNA sequence of the
original clone, there was a one-base insertion causing a shift to
a reading frame with a premature stop codon. This appears to
have been an artefact in that particular clone, as another clone
subsequently isolated from a human colonic cDNA library
matches that reported here from the human brain.

Due to the high content of hydroxy amino acids, the C-termi-
nal domain of MLK2 may be a target for regulatory phosphory-
lation events. There is one 13-residue segment (residues 524-
537) comprised of six glycine, seven serine, and one threonine
residue, that is not conserved in the MLK3 C-terminal sequence.
This segment, however, does have some similarity to the glycine-
rich sequence in the N-terminal region of MLK3. Within the
MLK2 C-terminal domain there are also several proline-rich mo-
tifs that are similar, but not identical, to sequences identified as
binding peptides for SH3 domains from several proteins (Ren et
al., 1993; Yu et al., 1994). Recently, two models have been put
forward to explain the mechanism of SH3 domain recognition
of poly(proline) peptides (Feng et al., 1994; Lim and Richards,
1994). Both of these models suggest that the minimal sequence
required for SH3 domain recognition is Pro-Xaa-Xaa-Pro. In an
extensive study, [...] used both sequence compari-
sons and mutagenic analysis to propose a general scheme for
recognition. In this analysis two classes of binding peptides were
identified. The peptides conformed to either Xaa-p-Xaa-P-p-Xaa
(class I) or Xaa-P-p-Xaa-P-p-Xaa (class II) consensus se-
quences (where Xaa is any amino acid, p is a scaffolding residue
that is often proline and P is a critical proline residue). Within
the C-terminal domain of MLK2 there are four P-P-Xaa-P and
three P-Xaa-P-P sequences. There is also one particular proline-
rich motif with the sequence PSPPPSPPAPTPTP that contains
both the class I and class II consensus sequences. These proline-
rich sequences present the possibility for an intramolecular in-
teraction between the MLK2-SH3 domain and its own C-termi-
nal domain. This same possibility has been noted for MLK3 (Ing
et al., 1994).
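
The proline patterns named above can be located with a short overlapping-match scan. The sketch below is an illustration only: the helper `motif_starts` is hypothetical, the patterns are written as plain regular expressions with `.` standing for Xaa, and the scan is run on the single motif quoted in the text rather than on the full C-terminal domain, so its counts are not expected to equal the domain-wide totals.

```python
import re


def motif_starts(sequence: str, pattern: str) -> list[int]:
    """Return 1-based start positions of all matches, overlaps included.

    Wrapping the pattern in a lookahead makes each match zero-width, so
    the scan advances one residue at a time and overlapping hits are kept.
    """
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", sequence)]


# The proline-rich motif quoted in the text.
motif = "PSPPPSPPAPTPTP"

pxxp = motif_starts(motif, "P..P")  # minimal Pro-Xaa-Xaa-Pro core
ppxp = motif_starts(motif, "PP.P")  # P-P-Xaa-P
pxpp = motif_starts(motif, "P.PP")  # P-Xaa-P-P

print("P-Xaa-Xaa-P starts:", pxxp)
print("P-P-Xaa-P starts:  ", ppxp)
print("P-Xaa-P-P starts:  ", pxpp)
```

Within this 14-residue motif the scan finds the minimal core at four positions, with two P-P-Xaa-P and two P-Xaa-P-P arrangements among them.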

Tissue distribution. By Northern-blot analysis of mRNAs from
several human tissues, MLK2 expression was highest in brain
and skeletal muscle, with a reduced signal in pancreas. MLK2
is also expressed at low levels in epithelial cell lines of the breast
and colon, but was not detected in haemopoietic cell lines
(Dorow, D., unpublished results). MLK3, however, has a wide
expression pattern with moderate expression in most cell lines
and tissues examined (Ing et al., 1994; Gallo et al., 1994; Ezoe
et al., 1994). Expression of MLK3 is high in placenta, lung, and
liver whereas the expression of MLK2 is barely detectable.
MLK3 expression has also been found to be high in melanocytes
and in cells of haematopoietic origin (Ing et al., 1994; Ezoe et
al., 1994). The expression pattern of murine DLK is more re-
stricted, with significant expression being detected only in foetal
and adult brain (Holzman et al., 1994).

Chromosomal localisation. The gene encoding MLK2 has been
mapped to human chromosome 19 q13.1-13.3. The distribution
of signal in this analysis suggests that the gene resides in the
q13.2 region. A search of the Genome Data Base (William H.
Welch Medical Library, Johns Hopkins University, Baltimore,
USA) revealed that other genes mapped to human 19 q13.2 in-
clude apolipoproteins C-I, C-II, and E, several carcinoembryonic
antigen and pregnancy-specific β1 glycoprotein family mem-
bers, and transforming growth factor β1. The gene for MLK1
has been localised to human chromosome 14 q24.3-31 (Dorow
et al., 1993) while that of MLK3 was mapped to 11 q13.1-13.3
(Ing et al., 1994).

Thus far, the role of MLK proteins in cellular networks has
not been elucidated. The presence within their sequences of do-
mains associated with regulation of signal transduction, phos-
phorylation and gene transcription, however, suggests that they
may play important roles in the control of cellular processes.
Identification of proteins with which the different MLK2 do-
mains interact may provide information on new pathways for
cellular regulation.

A Novel Gene Containing a Trinucleotide Repeat
That Is Expanded and Unstable
on Huntington's Disease Chromosomes



The Huntington's Disease Collaborative 
Research Group* 

Summary 

The Huntington's disease (HD) gene has been mapped
to 4p16.3 but has eluded identification. We have used
haplotype analysis of linkage disequilibrium to spot-
light a small segment of 4p16.3 as the likely location
of the defect. A new gene, IT15, isolated using cloned
trapped exons from the target area contains a poly-
morphic trinucleotide repeat that is expanded and
unstable on HD chromosomes. A (CAG)n repeat longer
than the normal range was observed on HD chromo-
somes from all 75 disease families examined, com-
prising a variety of ethnic backgrounds and 4p16.3
haplotypes. The (CAG)n repeat appears to be located
within the coding sequence of a predicted ~348 kd
protein that is widely expressed but unrelated to any
known gene. Thus, the HD mutation involves an
unstable DNA segment, similar to those described in
fragile X syndrome, spino-bulbar muscular atrophy,
and myotonic dystrophy, acting in the context of a
novel 4p16.3 gene to produce a dominant phenotype.

Introduction

Huntington's disease (HD) is a progressive neurodegener-
ative disorder characterized by motor disturbance, cogni-
tive loss, and psychiatric manifestations (Martin and Gu-
sella, 1986). It is inherited in an autosomal dominant
fashion and affects ~1 in 10,000 individuals in most popu-
lations of European origin (Harper et al., 1991). The hall-
mark of HD is a distinctive choreic movement disorder
that typically has a subtle, insidious onset in the fourth to
fifth decade of life and gradually worsens over a course
of 10 to 20 years until death. Occasionally, HD is ex-
pressed in juveniles, typically manifesting with more se-
vere symptoms including rigidity and a more rapid course.
Juvenile onset of HD is associated with a preponderance
of paternal transmission of the disease allele. The neuro-
pathology of HD also displays a distinctive pattern, with
selective loss of neurons that is most severe in the caudate
and putamen. The biochemical basis for neuronal death
in HD has not yet been explained, and there is conse-
quently no treatment effective in delaying or preventing
the onset and progression of this devastating disorder.

The genetic defect causing HD was assigned to chromosome 4
in 1983 in one of the first successful linkage analyses
using polymorphic DNA markers in humans (Gusella

Figure 1. Long-Range Restriction Map of the
HD Candidate Region

A partial long-range restriction map of 4p16.3
is shown (adapted from Lin et al. [1991]). The
HD candidate region determined by recombination
events is depicted by hatched bars between
D4S10 and D4S98. The portion of the
HD candidate region implicated as the site of
the defect by linkage disequilibrium haplotype
analysis (MacDonald et al., 1992) is shown as
a closed bar. Below the schematic map, the
region from D4S180 to D4S182 is expanded to
show the cosmid contig (averaging 40 kb per
cosmid). The genomic coverage and, where
known, the transcriptional orientation (arrows,
5' to 3') of the IT15, IT11, IT10C3, and ADDA
genes is also shown. Locus names above the
map denote selected polymorphic markers that
have been used in HD families. The positions
of D4S127 and D4S95, which form the core of the
haplotype in the region of maximum disequilibrium,
are also shown in the cosmid contig. Restriction
sites are given for NotI (N), MluI (M),
and NruI (R). Sites displaying complete digestion are shown in boldface, while sites subject to frequent incomplete digestion are shown as lighter
symbols. Brackets around the N symbols indicate the presence of additional clustered NotI sites.



et al., 1983). Since that time, we have pursued a location
cloning approach to isolating and characterizing the HD
gene based on progressively refining its localization
(Gusella, 1989, 1991). Among other work, this has involved
the generation of new genetic markers in the region by a
number of techniques (Pohl et al., 1988; Whaley et al.,
1991; MacDonald et al., 1989a), the establishment of genetic
(MacDonald et al., 1989b; Allitto et al., 1991) and
physical maps of the implicated regions (Bucan et al.,
1990; Bates et al., 1991; Doucette-Stamm et al., 1991;
Altherr et al., 1992), the cloning of the 4p telomere of an
HD chromosome in a yeast artificial chromosome clone
(Bates et al., 1990; Youngman et al., 1992), the establishment
of yeast artificial chromosome (Bates et al., 1992)
and cosmid (S. B. et al., unpublished data) contigs of the
candidate region, as well as the analysis and characterization
of a number of candidate genes from the region
(Thompson et al., 1991; Taylor et al., 1992; Ambrose et al.,
1992; M. P. D. et al., submitted). Analysis of recombination
events in HD kindreds has identified a candidate region
of 2.2 Mb, between D4S10 and D4S98 in 4p16.3, as the
most likely position of the HD gene (MacDonald et al.,
1989b; Bates et al., 1991; Snell et al., 1992). Investigations
of linkage disequilibrium between HD and DNA markers
in 4p16.3 (Snell et al., 1989; Theilman et al., 1989) have
suggested that multiple mutations have occurred to cause
the disorder (MacDonald et al., 1991). However, haplotype
analysis using multiallele markers has indicated that at
least one-third of HD chromosomes are ancestrally related
(MacDonald et al., 1992). The haplotype shared by these
HD chromosomes indicates that a 500 kb segment between
D4S180 and D4S182 is the most likely site of the
genetic defect.
Targeting this 500 kb region for saturation with gene
transcripts, we have used exon amplification as a rapid
method for obtaining candidate coding sequences (Buckler
et al., 1991). This strategy has previously identified
three genes: the a-adducin gene (ADDA) (Taylor et al.,
1992) and a putative novel transporter gene (IT10C3) in
the distal portion of this segment (M. P. D. et al., submitted),
and a novel G protein-coupled receptor kinase gene
(IT11) in the central portion (Ambrose et al., 1992). However,
no defects implicating any of these genes as the HD
locus have been found. We have now applied the exon
amplification approach to the proximal portion of the 500
kb segment. We have identified a large gene, IT15, spanning
~210 kb, that encodes a previously undescribed protein
of ~348 kd. The IT15 reading frame contains a polymorphic
(CAG)n trinucleotide repeat with at least 17 alleles
in the normal population, varying from 11 to 34 CAG copies.
On HD chromosomes, the length of the trinucleotide
repeat is substantially increased, to a range of 42 to over
66 copies, and shows an apparent correlation with age of
onset, the longest segments being detected in juvenile
HD cases. The instability in the length of the repeat is
reminiscent of similar trinucleotide repeats in the fragile
X syndrome and in myotonic dystrophy (Suthers et al.,
1992). The presence of an unstable, expandable trinucleotide
repeat on HD chromosomes in the region of strongest
linkage disequilibrium with the disorder suggests that this
alteration underlies the dominant phenotype of HD and
that IT15 encodes the HD gene.

Results 

Application of Exon Amplification to Obtain
Trapped, Cloned Exons

The HD candidate region defined by discrete recombina-

the IT15 Transcript

[Figure 4. Composite sequence of IT15: nucleotide sequence with the predicted amino acid translation. The sequence listing is not legible in this copy.]

matic of the composite sequence derived as described in
the figure legend. Figure 3 also displays the locations on
the composite sequence of the nine trapped exon clones.

The composite sequence of IT15, containing the entire
predicted coding sequence, spans 10,366 bases, including
a tail of 18 A's as shown in Figure 4. An open reading
frame of 9432 bases begins with a potential initiator
methionine codon at base 316, located in the context of an
optimal translation initiation sequence. An in-frame stop
codon is located 240 bases upstream of this site. The protein
product of IT15 is predicted to be a 348 kd protein
containing 3144 amino acids. Although we have chosen
the first Met codon in the long open reading frame as the
probable initiator codon, we cannot exclude the possibility
that translation actually begins at a more 3' Met
codon, producing a smaller protein.
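
The figures quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below is an illustration only and is not part of the reported analysis. It assumes that the 9432-base reading frame is counted without the stop codon, and it uses an average residue mass of about 110 daltons, a common rule of thumb that does not come from the text. The function name `orf_summary` is hypothetical.

```python
# Cross-check of the IT15 reading-frame figures quoted above.
# Assumptions (not from the source): the 9432 bases exclude the stop codon,
# and an average amino acid residue weighs roughly 110 Da.

ORF_START = 316        # base position of the proposed initiator Met codon
ORF_LENGTH = 9432      # length of the open reading frame in bases
AVG_RESIDUE_DA = 110   # rule-of-thumb average residue mass (assumption)


def orf_summary(start: int, length: int) -> dict:
    """Return codon count, last coding base, and approximate mass in kd."""
    if length % 3 != 0:
        raise ValueError("an open reading frame must be a multiple of 3 bases")
    codons = length // 3
    return {
        "codons": codons,
        "last_coding_base": start + length - 1,
        "approx_mass_kd": codons * AVG_RESIDUE_DA / 1000,
    }


summary = orf_summary(ORF_START, ORF_LENGTH)
print(summary)
```

Under these assumptions the frame holds 3144 codons, which matches the stated 3144 amino acids, and the rough mass estimate of about 346 kd is close to the predicted 348 kd.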

Polymorphic Variation of the (CAG)n
Trinucleotide Repeat

Near its 5' end, the IT15 sequence contains 21 copies of
the triplet CAG, encoding glutamine (Figure 5). When this
sequence was compared with our collection of genomic
sequences surrounding simple sequence repeats in
4p16.3, we found that normal cosmid L191F1 had 18 copies
of the triplet, indicating that the (CAG)n repeat is
polymorphic (Figure 5). We chose primers from the genomic
sequence flanking the repeat to establish a polymerase
chain reaction (PCR) assay for this variation. In the normal
population, this simple sequence repeat polymorphism
displays at least 17 discrete alleles, ranging from about
11 to 34 repeat units (Table 1). Ninety-eight percent of the
173 normal chromosomes tested contained repeat lengths
between 11 and 24 repeats. Two chromosomes were detected
in the 25-30 repeat range and 2 normal chromosomes
had 33 and 34 repeats, respectively. The overall
heterozygosity on normal chromosomes was 80%. We
presume, based on sequence analysis of three clones,
that the variation is based entirely on the (CAG)n, but we
cannot exclude the potential for variation of the smaller
downstream (CCG)7, which is also included in the PCR
product.
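
Counting the repeat from sequence data, as was done for the clones described above, can be illustrated with a short sketch. This is not the authors' procedure. The function name `longest_cag_run` and the flanking bases in the example are hypothetical and chosen only for illustration; the function reports the longest uninterrupted run of CAG triplets in a sequence string.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Return the number of CAG units in the longest uninterrupted run."""
    seq = sequence.upper()
    # Each match is a maximal stretch of back-to-back CAG triplets.
    runs = re.findall(r"(?:CAG)+", seq)
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical test sequence: the flanking bases are invented for illustration;
# the 21 copies mirror the count reported for the IT15 cDNA.
example = "ATGGCG" + "CAG" * 21 + "CAACAGCCGCCA"
print(longest_cag_run(example))
```

A run interrupted by another codon (here CAA) is counted as two separate runs, so the sketch reports only the longest pure CAG stretch.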

Instability of the Trinucleotide Repeat 
on HD Chromosomes 

Sequence analysis of cosmid GUS72-2130, derived from
a chromosome with the major HD haplotype (see below),
revealed 48 copies of the trinucleotide repeat, far more
than the number of copies in the largest normal allele
(Figure 5). When the PCR assay was applied to HD chromosomes,
a pattern strikingly different from the normal variation
was observed. HD heterozygotes contained one
discrete allelic product in the normal size range and one
PCR product of much larger size, suggesting that the
(CAG)n repeat on HD chromosomes is expanded relative
to normal chromosomes.

Figure 6 shows the patterns observed when we performed
the PCR assay on lymphoblast DNA from a selected
nuclear family in a large Venezuelan HD kindred.
In this family, DNA marker analysis has shown previously
that the HD chromosome was transmitted from the father

NORMAL COSMID    NORMAL cDNA    HD COSMID
(CAG)18          (CAG)21        (CAG)48

Figure 5. DNA Sequence Analysis of the (CAG)n Repeat
DNA sequence shown in panels 1, 2, and 3 demonstrates the variation
in the (CAG)n repeat detected in normal cosmid L191F1 (1), cDNA
IT16C (2), and HD cosmid GUS72-2130 (3). Panels 1 and 3 were generated
by direct sequencing of cosmid subclones using the primer
5'-GGCGGGAGACCGCCATGGCG-3'. Panel 2 was generated using
the pBSKII T7 primer 5'-AATACGACTCACTATAG-3'.



Table 1. Comparison of HD and Normal Repeat Length

Range of Allele Sizes    Normal Chromosomes     HD Chromosomes
(Number of Repeats)      Number   Frequency     Number   Frequency

Figure 6. PCR Analysis of the (CAG)n Repeat in a Venezuelan HD
Sibship with Some Offspring Displaying Juvenile Onset
Results of PCR analysis of a sibship in the Venezuelan HD pedigree
are shown. Affected individuals are represented by closed symbols.
Progeny are shown as triangles, and the birth order of some individuals
has been changed for confidentiality. AN1, AN2, and AN3 mark the
positions of the allelic products from normal chromosomes. AE marks
the range of PCR products from the HD chromosome. The intensity
of background constant bands, which represent a useful reference for
comparison of the above PCR products, varies with slight differences
in PCR conditions. The PCR products from cosmids L191F1 and
GUS72-2130 are loaded in lanes 12 and 13 and have 18 and 48 CAG
repeats, respectively.



(lane 2) to seven children (lanes 3, 5, 6, 7, 8, 10, and 11).
The three normal chromosomes present in this mating
yielded a PCR product in the normal size range (AN1,
AN2, and AN3) that was inherited in a Mendelian fashion.
The HD chromosome in the father yielded a diffuse, fuzzy
PCR product slightly smaller than the 48 repeat product
of our non-Venezuelan HD cosmid. Except for the DNA
in lane 5, which did not PCR amplify, and in lane 11, which
displayed only a single normal allele, each of the affected
children's DNAs yielded a PCR product of a different size
(AE), indicating instability of the HD chromosome (CAG)n
repeat. Lane 6 contained an HD-specific product slightly
smaller than or equal to that of the father's DNA. Lanes
3, 7, 10, and 8, respectively, contained HD-specific PCR
products of progressively larger size. The absence of an
HD-specific PCR product in lane 11 suggested that this
child's DNA possessed a (CAG)n repeat that was too long
to amplify efficiently. This was verified by Southern blot
analysis in which the expanded HD allele was easily detected
and estimated to contain up to 100 copies of the
repeat. Notably, this child had juvenile onset of HD at the
very early age of 2 years. The onset of HD in the father
was when he was in his early 40s, typical of most adult HD
represented by lanes 3, 7, 10, and 8 were 26, 25, 14,
and 11 years, respectively, suggesting a rough correlation
between age at onset of HD and the length of the (CAG)n
repeat on the HD chromosome. In keeping with this trend,
the offspring represented in lane 6 with the fewest repeats
has reached adulthood without showing symptoms of the
disorder.
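
The rough correlation described here can be expressed as a rank correlation. The sketch below is an illustration and not an analysis from the paper. Because the text gives only the order of the product sizes for lanes 3, 7, 10, and 8, not the sizes themselves, the code uses ranks 1 to 4 for size together with the reported ages at onset. The helper names `ranks` and `spearman_rho` are hypothetical, and four data points are far too few to support any statistical conclusion.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length, tie-free lists."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Lanes 3, 7, 10, 8 in order of increasing product size (only the order is
# given in the text), paired with the reported ages at onset in years.
size_order = [1, 2, 3, 4]
age_at_onset = [26, 25, 14, 11]
rho = spearman_rho(size_order, age_at_onset)
print(rho)
```

For these four offspring the ordering is perfectly inverse, so the coefficient is -1: the larger the product, the earlier the onset.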

Figure 7 shows PCR analysis for a second sibship from

Figure 7. PCR Analysis of the (CAG)n Repeat in a Venezuelan HD Sibship with Offspring Homozygous for the Same HD Haplotype
Results of PCR analysis of a sibship from the Venezuelan HD pedigree in which both parents are affected by HD are shown. Progeny are shown
as triangles and birth order has been altered for confidentiality. No HD diagnostic information is given to preserve the blind status of investigators
in the Venezuelan Collaborative Group. AN1 and AN2 mark the positions of the allelic products from normal parental chromosomes. AE marks
the range of PCR products from the HD chromosome. The PCR products from cosmids L191F1 and GUS72-2130 are loaded in lanes 29 and 30
and have 18 and 48 CAG repeats, respectively.

the Venezuelan pedigree, in which both parents are HD
heterozygotes carrying the same HD chromosome based
on DNA marker studies. Several of the offspring are HD
homozygotes (lanes 6 and 7, 10 and 11, 13 and 14, 17
and 18, 23 and 24) as reported previously (Wexler et al.,
1987). Each parent's DNA contained 1 allele in the normal
range (AN1 and AN2), which was transmitted in a Mendelian
fashion. The HD-specific products (AE) from the DNA
of both parents and children were all much larger than the
normal allelic products and also showed extensive variation
in mean size. We have not provided a neurologic diagnosis
for the offspring in this pedigree to maintain the blind
status of investigators involved in the ongoing Venezuela
HD Project, although age of onset again appears to parallel
repeat length. Paired samples under many of the individual
symbols represent independent lymphoblast lines
initiated at least 1 year apart. The variance between paired
samples was not as great as between the different
individuals, suggesting that the major differences in size of the
PCR products resulted from meiotic transmission. Of special
note is the result obtained in lanes 13 and 14. This
HD homozygote's DNA yielded one PCR product larger
and one smaller than the HD-specific PCR products of

To date, we have tested 75 independent HD families,
representing all different haplotypes reported by MacDonald
et al. (1992) and a wide range of ethnic backgrounds.
In all 75 cases, a PCR product larger than the normal size
range was produced from the HD chromosome. The sizes
of the HD-specific products ranged from 42 repeat copies
to more than 66 copies, with a few individuals failing to
yield a product because of the extreme length of the repeat.
In these cases, Southern blot analysis revealed an
increase in the length of an EcoRI fragment, with the
largest allele approximating 100 copies of the repeat.
Figure 8 shows the variation detected in members of an American
family of Irish ancestry in which the major HD haplotype
is segregating. Cosmid GUS72-2130 was cloned from
the HD homozygous individual whose DNA was amplified
in lane 2. As was observed in the Venezuelan HD pedigree
(Figures 6 and 7), which segregates the disorder with a
different 4p16.3 haplotype, the HD-specific PCR products
for this family display considerable size variation.

New Mutations to HD? 

The mutation rate in HD has been reported to be very low.
To test whether the expansion of the (CAG)n repeat is the
mechanism by which new HD mutations occur, we have
examined two pedigrees with sporadic cases of HD in
which intensive searching failed to reveal a family history
of the disorder. In these cases, we gathered pedigree information
sufficient to identify the same chromosomes in
both the affected individual and unaffected relatives.
Figure 9 shows the results of PCR analysis of the (CAG)n
repeat in these families. The chromosomes in each family
were assigned an arbitrary number based on typing for a
large number of restriction fragment length polymorphism
and simple sequence repeat markers in 4p16.3 defining
distinct haplotypes; the presumed HD chromosome is
starred.

Figure 8. PCR Analysis of the (CAG)n Repeat in Members of an American
Family with an Individual Homozygous for the Major HD Haplotype
Results of PCR analysis of members of an American family segregating
the major HD haplotype. AN marks the range of normal alleles;
AE marks the range of HD alleles. Lanes 1, 3, 4, 5, 7, and 8 represent
PCR products from related HD heterozygotes. Lane 2 contains the
PCR products from a member of the family homozygous for the same
HD chromosome. Lane 6 contains PCR products from a normal individual.
Pedigree relationships and affected status are not presented to
preserve confidentiality. The PCR products from cosmids L191F1 and
GUS72-2130 (which was derived from the individual represented in
lane 2) are loaded in lanes 9 and 10 and have 18 and 48 CAG repeats,
respectively.



In family 1, HD first appeared in individual II-3, who transmitted
the disorder, along with chromosome 3*, to III-1.
This same chromosome was present in II-2, an elderly
unaffected individual. PCR analysis revealed that chromosome
3* from II-2 produced a PCR product at the extreme
high end of the normal range (~36 CAG copies). However,
the (CAG)n repeat on the same chromosome in II-3 and
III-1 had undergone sequential expansions to ~44 and
~46 copies, respectively. A similar result was obtained
in family 2, where the presumed new HD mutant III-2 had
a considerably expanded repeat relative to the same chromosome
in II-1 and III-1 (~49 versus ~33 CAG copies).
In both families 1 and 2, the ultimate HD chromosome
displays the marker haplotype characteristic of one-third
of all HD chromosomes, suggesting that this haplotype
may be predisposed to undergoing repeat expansion.
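
The size of each step in these two families follows directly from the approximate copy numbers quoted above. The short sketch below only tabulates those differences and makes no claim about mechanism. The dictionary layout and the function name `expansions` are hypothetical; the values are the approximate counts from the text. In family 1, II-2 and II-3 belong to the same generation, so the first difference compares two carriers of the same chromosome rather than a parent and child.

```python
# Approximate CAG copy numbers reported for the two sporadic-case families.
# Individual labels follow the pedigree numbering used in the text.
observations = {
    "family 1": [("II-2", 36), ("II-3", 44), ("III-1", 46)],
    "family 2": [("II-1 and III-1", 33), ("III-2", 49)],
}


def expansions(series):
    """Return the change in copy number between consecutive entries."""
    return [later[1] - earlier[1] for earlier, later in zip(series, series[1:])]


for family, series in observations.items():
    print(family, expansions(series))
```

On these figures the repeat on chromosome 3* differs by about 8 and then 2 copies in family 1, and by about 16 copies in family 2.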

Discussion

The discovery of an expanded, unstable trinucleotide repeat
on HD chromosomes suggests that the long-sought
HD gene has at last been uncovered and that the disorder
constitutes an example of a mutational mechanism that
may prove quite common in human genetic disease.
Elongation of a trinucleotide repeat sequence has been implicated
previously as the cause of three quite different human
disorders, the fragile X syndrome, myotonic dystrophy, and
spino-bulbar muscular atrophy. Our initial observations
of repeat expansion in HD indicate that this phenomenon
shares features with each of these disorders.

In the fragile X syndrome, expression of a constellation
of symptoms, including mental retardation and a fragile

Figure 9. PCR Analysis of the (CAG)n Repeat in Two Families with a
Supposed New Mutation Causing HD

Results of PCR analysis of two families in which sporadic HD cases
representing putative new mutants are shown. Individuals in each
pedigree are numbered by generation (roman numerals) and order in the



site at Xq27.3, is associated with expansion of a (CGG)n
repeat thought to be in the 5' untranslated region of the
FMR1 gene (Fu et al., 1991; Kremer et al., 1991; Verkerk
et al., 1991). In myotonic dystrophy, a dominant disorder
involving muscle weakness with myotonia that typically
presents in early adulthood, the unstable trinucleotide repeat,
(CTG)n, is located in the 3' untranslated region of
the myotonin protein kinase gene (Aslanidis et al., 1992;
Brook et al., 1992; Buxton et al., 1992; Fu et al., 1992;
Harley et al., 1992a; Mahadevan et al., 1992). The unstable
(CAG)n repeat in HD may be within the coding sequence
of the IT15 gene, a feature shared with spino-bulbar muscular
atrophy, an X-linked recessive adult-onset disorder
of the motor neurons caused by expansion of a (CAG)n
repeat in the coding sequence of the androgen receptor
gene (LaSpada et al., 1991). The repeat length in both
the fragile X syndrome and myotonic dystrophy tends to
increase in successive generations, sometimes quite dramatically.
Occasionally, decreases in the average repeat
length are observed (Fu et al., 1991; Yu et al., 1992; Bruner
et al., 1993). The HD trinucleotide repeat is also unstable,
usually expanding when transmitted to the next generation,
but contracting on occasion. In HD, as in the other
disorders, change in copy number occurs in the absence
of recombination. Compared with the fragile X syndrome,
myotonic dystrophy, and HD, the instability of the disease
allele in spino-bulbar muscular atrophy is more limited,
and dramatic expansions of repeat length have not been
seen (Biancalana et al., 1992).

Expansion of the repeat length in myotonic dystrophy
is associated with a particular chromosomal haplotype,
suggesting the existence of a primordial predisposing mutation
(Harley et al., 1991, 1992a; Ashizawa and Epstein,
1991). In the fragile X syndrome, there may be a limited
number of ancestral mutations that predispose to increases
in trinucleotide repeat number (Richards et al., 1992;
Oudet et al., 1993). The linkage disequilibrium analysis
used to home in on IT15 indicates that there are several
haplotypes associated with HD, but that at least one-third
of HD chromosomes are ancestrally related (MacDonald
et al., 1992). These data, combined with the reported low
rate of new mutation to HD (Harper, 1992), suggest that
expansion of the trinucleotide repeat may only occur on
select chromosomes. Our analysis of two families in which
new mutation was supposed to have occurred is consistent
with the view that there may be particular normal chromosomes
that have the capacity to undergo expansion of
the repeat into the HD range. In each of these families, a
chromosome with a (CAG)n repeat length in the upper end
of the normal range was segregating on a chromosome



pedigree. Triangles are used to protect confidentiality. Closed symbols
indicate symptomatic individuals. The different chromosomes segregating
in the pedigree have been distinguished by extensive typing
with polymorphic markers in 4p16.3 and have been assigned arbitrary
numbers shown above the gel lanes. The starred chromosomes (chromosome
3 in [A] and 1 in [B]) represent the presumed HD chromosome.
AN denotes the range of normal alleles; AE denotes the range of alleles
present in affected individuals and in their unaffected relatives bearing
the same chromosome.

whose 4p16.3 haplotype matched the most common haplotype
seen on HD chromosomes, and the clinical appearance
of HD in these two cases was associated with expansion
of the trinucleotide repeat.

The recent application of haplotype analysis to explore
the linkage disequilibrium on HD chromosomes pointed
to a portion of a 2.2 Mb candidate region defined by the
majority of recombination events described in HD pedigrees
(MacDonald et al., 1992). Previously, the search for
the gene was confounded by three matings in which the
genetic inheritance pattern was inconsistent with the remainder
of the family (MacDonald et al., 1989b; Pritchard
et al., 1992). These matings produced apparently affected
HD individuals despite the inheritance of only normal alleles
for markers throughout 4p16.3, effectively excluding
inheritance of the HD chromosome present in the rest of
the pedigree. Using our PCR assay, we have tested each
of these families and find that, like other HD kindreds, an
expanded allele generally segregated with HD in affected
individuals of all three pedigrees. However, an expanded
allele was not present in those specific individuals with the
inconsistent 4p16.3 genotypes. Instead, these individuals
displayed the normal alleles expected, based on analysis
of other markers in 4p16.3. It is conceivable that these
inconsistent individuals do not, in fact, have HD, but some
other disorder. Alternatively, they might represent genetic
mosaics in which the HD allele is more heavily represented
and/or more expanded in brain tissue than in the
lymphoblast DNA used for genotyping.

It can be expected that the capacity to monitor directly
the size of the trinucleotide repeat in individuals "at risk"
for HD will revolutionize preclinical testing for the disorder,
eliminating the need for complicated linkage analyses,
facilitating genetic counseling, and extending the applicability
of presymptomatic and prenatal diagnosis to at risk
individuals with no living affected relatives. We consider
it of the utmost importance that the current internationally
accepted guidelines and counseling protocols for testing
those at risk continue to be observed, and that samples
from unaffected relatives should not be tested inadvertently
or without full consent. In our limited initial series of
patients, there is an apparent correlation between repeat
length and age of onset of the disease, reminiscent of
that reported in myotonic dystrophy (Harley et al., 1992b;
Tsilfidis et al., 1992). The largest HD trinucleotide repeat
segments were found in juvenile onset cases, where there
is a known preponderance of male transmission (Merrit
et al., 1969). More detailed studies will be required to establish
whether expansion of the repeat occurs preferentially
in transmission from males. It will also be essential
to perform a careful analysis of the extent, if any, of overlap
between the range of repeat lengths in normal and HD
individuals, to evaluate fully the relationship between age
of onset and repeat length, and to examine the possibility
of somatic variation in repeat length due to mitotic instability.
These studies must be completed before the (CAG)n
size is used to provide prognostic information to at risk
HD individuals.

The expression of fragile X syndrome is associated with
direct inactivation of the FMR1 gene (Pieretti et al., 1991;
DeBoulle et al., 1993). The recessive inheritance pattern
of spino-bulbar muscular atrophy suggests that in this
disorder an inactive gene product is produced. In myotonic
dystrophy, the manner in which repeat expansion leads
to the dominant disease phenotype is unknown. There are
numerous possibilities for the mechanism of pathogenesis
of the expanded trinucleotide repeat in HD. Since
Wolf-Hirschhorn patients hemizygous for 4p16.3 do not display
features of HD and IT15 mRNA is present in HD homozygotes,
the expanded trinucleotide repeat does not cause
simple inactivation of the gene containing it. The observation
that the phenotype of HD is completely dominant,
since homozygotes for the disease allele do not differ clinically
from heterozygotes, has suggested that HD results
from a gain-of-function mutation, in which either the mRNA
product or the protein product of the disease allele would
have some new property or would be expressed inappropriately
(Wexler et al., 1987; Myers et al., 1989). If the
expanded trinucleotide repeat were translated, the consequences
on the protein product would be dramatic, increasing
the length of the poly-glutamine stretch near the
N-terminus. It is possible, however, that despite the presence
of an upstream Met codon, the normal translational
start occurs 3' to the (CAG)n repeat and there is no poly-glutamine
stretch in the protein product. In this case, the
repeat would be in the 5' untranslated region and might
be expected to have its dominant effect at the mRNA level.
The presence of an expanded repeat might directly alter
regulation, localization, stability, or translatability of the
mRNA containing it, and could indirectly affect its counterpart
from the normal allele in HD heterozygotes. Other
conceivable scenarios are that the presence of an expanded
repeat might alter the effective translation start
site for the HD transcript, thereby truncating the protein,
or alter the transcription start site for the IT15 gene, disrupting
control of mRNA expression. Finally, although the
repeat is located within the IT15 transcript, the possibility
that it leads to HD by virtue of an action on the expression
of an adjacent gene cannot be excluded.

Despite this final caveat, we believe it most likely that
the trinucleotide repeat expansion causes HD by its effect,
either at the mRNA or protein level, on the expression
and/or structure of the protein product of the IT15 gene,
which we have named huntingtin. Outside of the region
of the triplet repeat, the IT15 DNA sequence detected no
significant similarity to any previously reported gene in the
GenBank data base. Except for the stretches of glutamine
and proline near the N-terminus, the amino acid sequence
displayed no similarity to known proteins, providing no
conspicuous clues to huntingtin's function. The poly-glutamine
and poly-proline regions near the N-terminus indicate
similarity to a large number of proteins that also contain
long stretches of these amino acids. It is difficult to
assess the significance of such similarities, although it is
notable that many of these similarities are to DNA-binding
proteins and that huntingtin does have a single leucine
zipper motif at residue 1443. Huntingtin appears to be
widely expressed, yet cell death in HD is confined to specific
neurons in particular regions of the brain. Thus, with
the mystery of the genetic basis of HD apparently solved,
defining the normal function of the huntingtin protein and
delineating the mechanism whereby increased trinucleotide
repeat length leads to the characteristic neuropathology
of HD represent the next challenges in the effort to
understand and to treat this devastating disorder.

Experimental Procedures 

HD Cell Lines

Lymphoblast cell lines from HD families of varied ethnic backgrounds
used for genetic linkage and disequilibrium studies (Conneally et al.,
1989; MacDonald et al., 1992) have been established (Anderson and
Gusella, 1984) in the Molecular Neurogenetics Unit, Massachusetts
General Hospital, over the past 13 years. The Venezuelan HD pedigree
is an extended kindred of over 12,000 members, in which all affected
individuals have inherited the HD gene from a common founder
(Gusella et al., 1983, 1984; Wexler et al., 1987).

DNA and RNA Blotting

DNA was prepared from cultured cells, and DNA blots were prepared
and hybridized as described (Gusella et al., 1979, 1983). RNA was
prepared and Northern blotting was performed as described by Taylor
et al. (1992).

Construction of Cosmid Contig

The initial construction of the cosmid contig was by chromosome walking
from cosmids L19 and BJG6 (Allitto et al., 1991; Lin et al., 1991).
Two libraries were employed, a collection of Alu-positive cosmids from
the reduced cell hybrid H39-8C10 (Whaley et al., 1991) and an arrayed
flow-sorted chromosome 4 cosmid library (NM87545) provided by the
Los Alamos National Laboratory. Walking was accomplished by hybridization
of whole cosmid DNA, using suppression of repetitive and
vector sequences, to robot-generated high density filter grids (Nizetic
et al., 1991; Lehrach et al., 1990). Cosmids LiC2, L69F7, L22866,
and L83D3 were first identified by hybridization of yeast artificial
chromosome clone YGA2 to the same arrayed library (Bates et al., 1992;
Baxendale et al., 1991). HD cosmid GUS72-2130 was isolated by standard
screening of a GUS72 cosmid library using a single-copy probe.
Cosmid overlaps were confirmed by a combination of clone to clone
and clone to genomic hybridizations, single-copy probe hybridizations,
and restriction mapping.

cDNA Isolation and Characterization 

Exon probes were isolated and cloned as described (Buckler et al.,
1991). Exon probes and cDNAs were used to screen human λZAPII
cDNA libraries constructed from adult frontal cortex, fetal brain, adeno-
virus-transformed retinal cell line RCA, and liver RNA. cDNA clones,
PCR products, and trapped exons were sequenced as described
(Sanger et al., 1977). Direct cosmid sequencing was performed as
described (McClatchey et al., 1992). Data base searches were per-
formed using the BLAST network service of the National Center for
Biotechnology Information (Altschul et al., 1990).

PCR Assay of the (CAG)n Repeat

Genomic primers flanking the (CAG)n repeat are
5'-ATGAAGGCCTTCGAGTCCCTCAAGTCCTTC-3' and
5'-AAACTCACGGTCGGTGCAGCGGCTCCTCAG-3'. PCR amplification was performed in a re-
action volume of 25 µl using 50 ng of genomic DNA, 5 µg of each
primer, 10 mM Tris (pH 8.3), 5 mM KCl, 2 mM MgCl2, 200 µM (each)
dNTPs, 10% dimethylsulfoxide, 0.1 U of Perfectmatch (Stratagene),
2.5 µCi of [32P]dCTP (Amersham), and 1.25 U of Taq polymerase
(Boehringer Mannheim). After heating to 94°C for 1.5 min, the reaction
mix was cycled according to the following program: 40 cycles of 1 min
at 94°C, 1 min at 60°C, 2 min at 72°C. Five microliters of each PCR
was diluted with an equal volume of 95% formamide loading dye and
heat denatured for 2 min at 95°C. The products were resolved on 5%
denaturing polyacrylamide gels. The PCR product from this reaction
using cosmid L191F1 (CAG18) as the template was 247 bp. Allele sizes
were estimated relative to a DNA sequencing ladder, the PCR products
from sequenced cosmids, and the invariant background bands often
present on the gel. Estimates of allelic variation were obtained by 
typing unrelated individuals of largely Western European ancestry. 



who were normal parents of affected HD individuals from various pedi- 
grees. 
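
The sizing step above can be illustrated with a short calculation. The following Python sketch is not part of the published procedure; it only shows how a product length read against the sequencing ladder could be converted to a repeat count, given the 247 bp reference product stated in the text. The function name, the parameter names, and the reference repeat count passed in the usage lines are illustrative assumptions, and the sketch assumes that all length variation in the product comes from the CAG tract itself.

```python
CAG_UNIT_BP = 3             # one CAG trinucleotide is 3 bp
REFERENCE_PRODUCT_BP = 247  # reference product size stated in the text


def estimate_cag_repeats(product_bp: int,
                         reference_repeats: int,
                         reference_bp: int = REFERENCE_PRODUCT_BP) -> int:
    """Estimate the CAG repeat count of an allele from its PCR product size.

    Assumption (hypothetical helper, not from the source): the product
    differs from the reference product only by whole CAG units.
    reference_repeats is the repeat count of the sequenced reference
    template and must be supplied by the caller.
    """
    delta_bp = product_bp - reference_bp
    if delta_bp % CAG_UNIT_BP != 0:
        raise ValueError(
            f"size difference of {delta_bp} bp is not a multiple of "
            f"{CAG_UNIT_BP}; re-check the sizing"
        )
    repeats = reference_repeats + delta_bp // CAG_UNIT_BP
    if repeats < 0:
        raise ValueError("product is shorter than a zero-repeat allele")
    return repeats


if __name__ == "__main__":
    # Illustrative values only: a reference template assumed to carry
    # 18 repeats, and two hypothetical products sized from a gel.
    for size_bp in (241, 319):
        print(size_bp, "bp ->", estimate_cag_repeats(size_bp, 18), "repeats")
```

The reference repeat count is left as a required argument so that the calculation does not depend on any one template; in practice a size difference that is not a multiple of 3 bp would point to a sizing error or to variation outside the CAG tract.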